Automatic Tagging of Compound Verb Groups in Czech Corpora
Internal identifier: 001C01 (Main/Exploration); previous: 001C00; next: 001C02
Authors: Eva Žáčková [Czech Republic]; Luboš Popelínský [Czech Republic]; Miloslav Nepil [Czech Republic]
Source:
- Lecture Notes in Computer Science [ 0302-9743 ]
Abstract
In Czech corpora, compound verb groups are usually tagged in a word-by-word manner. As a consequence, some of the morphological tags of particular components of the verb group lose their original meaning. We present an improved method for automatic synthesis of verb rules. These rules describe all compound verb groups that are frequent in Czech. Using these rules, we can find compound verb groups in unannotated texts with high accuracy. The system for tagging compound verb groups in an annotated corpus that exploits the verb rules is described.
Url: https://api.istex.fr/ark:/67375/HCB-FZCR209N-S/fulltext.pdf
DOI: 10.1007/3-540-45323-7_20
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001778
- to stream Istex, to step Curation: 001276
- to stream Istex, to step Checkpoint: 001A30
- to stream Main, to step Merge: 001C45
- to stream Main, to step Curation: 001C01
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Automatic Tagging of Compound Verb Groups in Czech Corpora</title>
<author><name sortKey="Za Kova, Eva" sort="Za Kova, Eva" uniqKey="Za Kova E" first="Eva" last="Žá Ková">Eva Žá Ková</name>
</author>
<author><name sortKey="Popelinsk, Lubos" sort="Popelinsk, Lubos" uniqKey="Popelinsk L" first="Luboš" last="Popelínsk">Luboš Popelínsk</name>
</author>
<author><name sortKey="Nepil, Miloslav" sort="Nepil, Miloslav" uniqKey="Nepil M" first="Miloslav" last="Nepil">Miloslav Nepil</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:5986FD9D5B5AAF48236DA0482A5B726C251FF4C2</idno>
<date when="2000" year="2000">2000</date>
<idno type="doi">10.1007/3-540-45323-7_20</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-FZCR209N-S/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001778</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001778</idno>
<idno type="wicri:Area/Istex/Curation">001276</idno>
<idno type="wicri:Area/Istex/Checkpoint">001A30</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">001A30</idno>
<idno type="wicri:doubleKey">0302-9743:2000:Za Kova E:automatic:tagging:of</idno>
<idno type="wicri:Area/Main/Merge">001C45</idno>
<idno type="wicri:Area/Main/Curation">001C01</idno>
<idno type="wicri:Area/Main/Exploration">001C01</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Automatic Tagging of Compound Verb Groups in Czech Corpora</title>
<author><name sortKey="Za Kova, Eva" sort="Za Kova, Eva" uniqKey="Za Kova E" first="Eva" last="Žá Ková">Eva Žá Ková</name>
<affiliation wicri:level="3"><country xml:lang="fr">République tchèque</country>
<wicri:regionArea>NLP Laboratory, Faculty of Informatics, Masaryk University, Botanická 68, CZ-60200, Brno</wicri:regionArea>
<placeName><settlement type="city">Brno</settlement>
<region>Moravie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">République tchèque</country>
</affiliation>
</author>
<author><name sortKey="Popelinsk, Lubos" sort="Popelinsk, Lubos" uniqKey="Popelinsk L" first="Luboš" last="Popelínsk">Luboš Popelínsk</name>
<affiliation wicri:level="3"><country xml:lang="fr">République tchèque</country>
<wicri:regionArea>NLP Laboratory, Faculty of Informatics, Masaryk University, Botanická 68, CZ-60200, Brno</wicri:regionArea>
<placeName><settlement type="city">Brno</settlement>
<region>Moravie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">République tchèque</country>
</affiliation>
</author>
<author><name sortKey="Nepil, Miloslav" sort="Nepil, Miloslav" uniqKey="Nepil M" first="Miloslav" last="Nepil">Miloslav Nepil</name>
<affiliation wicri:level="3"><country xml:lang="fr">République tchèque</country>
<wicri:regionArea>NLP Laboratory, Faculty of Informatics, Masaryk University, Botanická 68, CZ-60200, Brno</wicri:regionArea>
<placeName><settlement type="city">Brno</settlement>
<region>Moravie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">République tchèque</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s" type="main" xml:lang="en">Lecture Notes in Computer Science</title>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In Czech corpora, compound verb groups are usually tagged in a word-by-word manner. As a consequence, some of the morphological tags of particular components of the verb group lose their original meaning. We present an improved method for automatic synthesis of verb rules. These rules describe all compound verb groups that are frequent in Czech. Using these rules, we can find compound verb groups in unannotated texts with high accuracy. The system for tagging compound verb groups in an annotated corpus that exploits the verb rules is described.</div>
</front>
</TEI>
<affiliations><list><country><li>République tchèque</li>
</country>
<region><li>Moravie</li>
</region>
<settlement><li>Brno</li>
</settlement>
</list>
<tree><country name="République tchèque"><region name="Moravie"><name sortKey="Za Kova, Eva" sort="Za Kova, Eva" uniqKey="Za Kova E" first="Eva" last="Žá Ková">Eva Žá Ková</name>
</region>
<name sortKey="Nepil, Miloslav" sort="Nepil, Miloslav" uniqKey="Nepil M" first="Miloslav" last="Nepil">Miloslav Nepil</name>
<name sortKey="Nepil, Miloslav" sort="Nepil, Miloslav" uniqKey="Nepil M" first="Miloslav" last="Nepil">Miloslav Nepil</name>
<name sortKey="Popelinsk, Lubos" sort="Popelinsk, Lubos" uniqKey="Popelinsk L" first="Luboš" last="Popelínsk">Luboš Popelínsk</name>
<name sortKey="Popelinsk, Lubos" sort="Popelinsk, Lubos" uniqKey="Popelinsk L" first="Luboš" last="Popelínsk">Luboš Popelínsk</name>
<name sortKey="Za Kova, Eva" sort="Za Kova, Eva" uniqKey="Za Kova E" first="Eva" last="Žá Ková">Eva Žá Ková</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001C01 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001C01 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Informatique |area= SgmlV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:5986FD9D5B5AAF48236DA0482A5B726C251FF4C2 |texte= Automatic Tagging of Compound Verb Groups in Czech Corpora }}
This area was generated with Dilib version V0.6.33.